SI649 W23 Altair Homework #4

Overview

We'll focus on maps and cartographic visualization. In this lab, you will practice:

  • Point Maps
  • Symbol Maps
  • Choropleth maps
  • Interactions with maps

After building these charts, you will use Streamlit to publish them on a website.

Lab Instructions

  • Save, rename, and submit the ipynb file (use your username in the name).
  • Complete every checkpoint by creating the required visualization in each cell.
  • Run every cell (use Runtime -> Restart and run all to make sure you have a clean working version), print the notebook to PDF, and submit the PDF file.
  • If you get stuck, show us your work by including links (URLs) to the resources you searched. You'll get partial credit for showing your work in progress.
In [1]:
import pandas as pd
import altair as alt
from vega_datasets import data

alt.data_transformers.disable_max_rows()

df = pd.read_csv('https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/airports.csv')
url = "https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/small-airports.json"
In [2]:
df.head()
Out[2]:
id ident type name latitude_deg longitude_deg elevation_ft continent iso_country iso_region municipality scheduled_service gps_code iata_code local_code home_link wikipedia_link keywords
0 6523 00A heliport Total Rf Heliport 40.070801 -74.933601 11.0 NaN US US-PA Bensalem no 00A NaN 00A NaN NaN NaN
1 323361 00AA small_airport Aero B Ranch Airport 38.704022 -101.473911 3435.0 NaN US US-KS Leoti no 00AA NaN 00AA NaN NaN NaN
2 6524 00AK small_airport Lowell Field 59.947733 -151.692524 450.0 NaN US US-AK Anchor Point no 00AK NaN 00AK NaN NaN NaN
3 6525 00AL small_airport Epps Airpark 34.864799 -86.770302 820.0 NaN US US-AL Harvest no 00AL NaN 00AL NaN NaN NaN
4 506791 00AN small_airport Katmai Lodge Airport 59.093287 -156.456699 80.0 NaN US US-AK King Salmon no 00AN NaN 00AN NaN NaN NaN

Visualization 1: Dot Density Map

Description of the visualization:

We want to visualize the density of small airports around the world. Each small airport is represented by a dot. The visualization has two layers, plus a tooltip:

  • The base layer shows the outline of the world map.
  • The point layer shows the individual small airports.
  • The tooltip shows the name of the airport.

Hint:

  • How can we show continents on the map? Which object from the JSON dataset can be used?
  • How can we show only small airports on the map?
In [3]:
df1 = df[df["type"] == "small_airport"]
In [4]:
# TODO: Vis 1
url = "https://raw.githubusercontent.com/deldersveld/topojson/master/world-continents.json"
source = alt.topo_feature(url, "continent")

base = alt.Chart(source).mark_geoshape(
    fill='lightgray',
    stroke='white'
).project('mercator').properties(
    width=800,
    height=600
)

points = alt.Chart(df1).mark_circle().encode(
    longitude='longitude_deg:Q',
    latitude='latitude_deg:Q',
    size=alt.value(10),
    tooltip='name:N',
    color=alt.value('red')
)

(base + points).properties(title='Small airports in the world').configure_title(fontSize=20)
Out[4]:

Visualization 2: Proportional Symbol

Description of the visualization:

The visualization shows faceted maps of the 20 most populous cities in the world by 2100. Each faceted chart has two layers, plus a tooltip:

  • The base layer shows the map of countries.
  • The second layer shows size-encoded points indicating the population of those cities.
  • The tooltip shows the city name and population.

Hint:

  • Which projection has been used in individual charts?
  • How can we create a faceted chart with one facet per year, arranged in 2 columns?
In [5]:
countries_url = data.world_110m.url
source = 'https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/population_prediction.csv'
In [6]:
df2 = pd.read_csv(source)
In [27]:
base = alt.Chart(alt.topo_feature(countries_url, 'countries')).mark_geoshape(    
    fill='lightgray',
    stroke='white'
).project('naturalEarth1').properties(
    width=600,
    height=450
)
In [28]:
# TODO: Vis 2
point = alt.Chart().mark_circle(color="green").encode(
    longitude = 'lon:Q',
    latitude = 'lat:Q',
    size=alt.Size('population:Q', title='Population (million)', scale=alt.Scale(range=[0, 2000])),
).properties(
    width=600,
    height=450
)

map2 = alt.layer(base, point, data = source).properties(
    width=600,
    height=450
).facet(
    facet='year:N', columns=2
).properties(
    title = 'The 20 Most Populous Cities in the World by 2100'
).configure_title(
    fontSize=20,
)

map2
Out[28]:

Visualization 3: Hurricane Trajectories

Description of the visualization:

Create a map that shows the paths (trajectories) of the 2017 hurricanes. Filter the data so that only 2017 hurricanes are shown. Remove Alaska and Hawaii from the map (Filter out ids 2 and 15).

Hint:

  • How will you keep only the 2017 hurricanes?
  • Which object can be used to show state boundaries?
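One possible way to approach the first hint is to derive a year column with a vectorized string slice and then filter on it. The sketch below assumes the `datetime` column is an ISO-style string, as in the sample output; the three rows are copied from the notebook outputs in this section, and the names `tracks` and `storms_2017` are made up for illustration.

```python
import pandas as pd

# Rows copied from the hurricane outputs shown in this section.
tracks = pd.DataFrame({
    "identifier": ["AL021868", "AL012017", "AL012017"],
    "name": ["UNNAMED", "ARLENE", "ARLENE"],
    "datetime": [
        "1868-10-05T12:00:00",
        "2017-04-16T06:00:00",
        "2017-04-16T12:00:00",
    ],
})

# The first four characters of an ISO timestamp are the year.
tracks["year"] = tracks["datetime"].str[:4]

# Keep only the rows recorded in 2017 (the year is a string here, not an int).
storms_2017 = tracks[tracks["year"] == "2017"]
print(storms_2017["name"].tolist())
```

The `.str[:4]` slice works on the whole column at once, which tends to be much faster than a row-wise `apply` on a table with tens of thousands of rows.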
In [9]:
states_url = data.us_10m.url
hurricane_data = pd.read_csv('https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/hurdat2.csv')
hurricane_data.sample(3)
Out[9]:
identifier name num_pts record_id status latitude longitude max_wind min_pressure datetime
1681 AL021868 UNNAMED 29 NaN TS 33.3 -77.6 50 -999 1868-10-05T12:00:00
11243 AL021907 UNNAMED 23 NaN TD 25.0 -79.0 30 -999 1907-09-18T12:00:00
21999 AL031950 CHARLIE 61 NaN TS 10.8 -37.5 35 -999 1950-08-23T18:00:00
In [10]:
def get_year(row):
    row["year"] = row["datetime"][0:4]
    return row
In [11]:
df3 = hurricane_data.apply(get_year,axis="columns")
In [12]:
df3 = df3[df3["year"] == '2017']
In [13]:
df3.head()
Out[13]:
identifier name num_pts record_id status latitude longitude max_wind min_pressure datetime year
49693 AL012017 ARLENE 27 NaN EX 35.8 -50.3 55 992 2017-04-16T06:00:00 2017
49694 AL012017 ARLENE 27 NaN EX 35.1 -49.5 55 989 2017-04-16T12:00:00 2017
49695 AL012017 ARLENE 27 NaN EX 34.4 -48.7 55 986 2017-04-16T18:00:00 2017
49696 AL012017 ARLENE 27 NaN EX 33.7 -47.8 50 987 2017-04-17T00:00:00 2017
49697 AL012017 ARLENE 27 NaN EX 33.2 -47.0 45 988 2017-04-17T06:00:00 2017
In [14]:
#TODO: Vis 3
states = alt.topo_feature(states_url, 'states')

base = alt.Chart(alt.topo_feature(states_url, 'states')).mark_geoshape(    
    fill='white',
    stroke='black'
).project('mercator').properties(
    width=900,
    height=500
)

map_without_ak_and_hi = base.transform_filter((alt.datum.id != 2) & (alt.datum.id != 15))

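# `detail` groups the points by storm so each hurricane is drawn as its own path;
# without it, all 2017 points are joined into one continuous line.
hurricane = alt.Chart(df3).mark_line(color="blue").encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    detail='identifier:N'
).project('mercator')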

map_without_ak_and_hi + hurricane
Out[14]:

Visualization 4: Choropleth Map

Description of the visualization:

The visualization has a choropleth map showing the population of different states and a sorted bar chart showing the top 15 states by population. These charts are connected using a click interaction.

Hint:

  • Which object can be used to show states on the map?
  • Which transform can be used to add population data to the geographic data? How can we combine two datasets in Altair?
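Conceptually, a lookup keyed on `id` behaves much like a left join: every geographic feature is kept, and matching attribute columns are attached where the key is found. The pandas sketch below illustrates that idea only; it is not the Altair call itself. The population rows are copied from the `state_pop` output in this section, while `shapes` and the id 72 (a feature with no population row) are hypothetical.

```python
import pandas as pd

# Stand-in for the map features: only their ids matter for the join.
# 72 is a hypothetical id that has no matching population row.
shapes = pd.DataFrame({"id": [6, 10, 72]})

# Two rows copied from the state_pop sample output.
state_pop_sample = pd.DataFrame({
    "state": ["California", "Delaware"],
    "id": [6, 10],
    "population": [39250017, 952065],
})

# Left join: keep every shape, attach state and population where the id matches.
joined = shapes.merge(state_pop_sample, on="id", how="left")
print(joined)
```

The unmatched feature keeps its row with missing values for `state` and `population`, which on a choropleth would typically show up as an uncolored shape.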
In [15]:
state_map = data.us_10m.url
state_pop = data.population_engineers_hurricanes()[['state', 'id', 'population']]
state_pop.sample(5)
Out[15]:
state id population
29 New Hampshire 33 1334795
8 District of Columbia 11 681170
7 Delaware 10 952065
49 Wisconsin 55 5778708
4 California 6 39250017
In [16]:
state_map_top = state_pop.sort_values(by="population", ascending = False).head(15)
In [17]:
state_map_top.head()
Out[17]:
state id population
4 California 6 39250017
43 Texas 48 27862596
9 Florida 12 20612439
32 New York 36 19745289
13 Illinois 17 12801539
In [18]:
#TODO: Vis 4
states = alt.topo_feature(state_map, 'states')

single = alt.selection_single(fields=['state'])

base = alt.Chart(alt.topo_feature(state_map, 'states')).mark_geoshape(    
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=state_pop, key='id', fields=['population','state'])
).add_selection(
    single
).encode(
    color= 'population:Q',
    opacity=alt.condition(single, alt.OpacityValue(1), alt.OpacityValue(0.3)),
).project('albersUsa').properties(
    width=500,
    height=400
)

bars = alt.Chart(state_map_top,title="Top 15 states by population"
).mark_bar().encode(
    x = alt.X('population:Q',
              title = 'population',
              ),
    y = alt.Y('state:N',title = 'state',
            sort=alt.EncodingSortField(
                field = 'population',
                order = 'ascending'
            )),
    color= 'population:Q',
    opacity=alt.condition(single, alt.OpacityValue(1), alt.OpacityValue(0.3)),
).add_selection(
    single
)

base | bars
Out[18]: